Distributed Tracing

What You Will Learn

The tracing data model: traces, spans, context, and baggage
How to set up OpenTelemetry for Python with Jaeger as the backend
Auto-instrumentation for FastAPI, SQLAlchemy, Redis, and HTTPX
How to create custom spans for business logic and pipelines
How W3C trace context propagates through HTTP, and how to do it manually through Kafka
Sampling strategies: when and how to sample traces in production
How to inject trace IDs into log lines to correlate logs and traces

Prerequisites

Requirement	Details
Python 3.11+	`asyncio` used throughout
FastAPI + SQLAlchemy + Redis	Auto-instrumentation targets
`opentelemetry-sdk` and related packages	Full install command below
Jaeger	Runs in docker-compose
Lessons 01 and 02 complete	Logging and metrics context assumed

pip install \
  opentelemetry-api \
  opentelemetry-sdk \
  opentelemetry-exporter-otlp-proto-grpc \
  opentelemetry-instrumentation-fastapi \
  opentelemetry-instrumentation-httpx \
  opentelemetry-instrumentation-sqlalchemy \
  opentelemetry-instrumentation-redis \
  opentelemetry-instrumentation-logging

The Incident: 800ms With No Visible Cause

Three microservices. One user request. Eight hundred milliseconds of total latency.

Individual service logs:

# API Gateway
INFO  request received  duration_ms=12

# Document Service
INFO  document.fetch    duration_ms=187

# ML Service
INFO  inference.run     duration_ms=143

12ms + 187ms + 143ms = 342ms. The user experienced 800ms. Where did the other 458ms go?

Without distributed tracing, this question is unanswerable from logs alone. The gaps between services - serialisation, network transit, queue time, connection setup - are invisible.

With distributed tracing, you open Jaeger and see:

Total: 800ms
├── api-gateway: handle_request        [12ms]
├── [WAIT: connection pool]            [89ms]   - this was the actual problem
├── document-service: fetch_doc        [187ms]
│   └── db: SELECT documents           [98ms]
├── [WAIT: network queue]              [112ms]  - packets queued at the NIC
├── ml-service: run_inference          [143ms]
│   └── [WAIT: model warmup]           [71ms]
├── [serialize response]               [189ms]  - JSON serialisation of large doc
└── [network]                          [68ms]

The 458ms gap is now fully explained: 89ms waiting for a database connection, 112ms of network queue, 189ms of JSON serialisation, and 68ms of network transit (89 + 112 + 189 + 68 = 458). Three of these point to specific, actionable fixes.

1. Tracing Concepts

Trace

A trace is the complete record of a request as it travels through a distributed system. Every trace has a globally unique TraceId: a 128-bit identifier, written as 32 hex characters.

Span

A span is one unit of work within a trace. Every span records:

SpanId (64-bit, unique within the trace)
ParentSpanId (the span that created this one; null for the root span)
Name (e.g., "document-service: fetch_doc")
StartTime and EndTime
Status (UNSET by default, OK, or ERROR; an ERROR status can carry a description)
Attributes (key-value pairs: http.method, db.statement, custom fields)
Events (timestamped log-like messages within the span)
Links (references to spans in other traces - useful for async message passing)
Kind (SERVER, CLIENT, PRODUCER, CONSUMER, or INTERNAL)

The Timing Diagram

Time ───────────────────────────────────────────────────────────────►

Trace: abc123
│
├── Span: api-gateway/handle_request (root)       ├──────────────────────────────────────────────────┤
│
│   ├── Span: document-service/handle_request     │  ├───────────────────────────────────────┤
│   │   ├── Span: db/SELECT                       │  │   ├──────────────┤
│   │   └── Span: redis/GET                       │  │             ├─┤
│   │
│   └── Span: ml-service/run_inference            │                              ├──────────────┤
│       └── Span: model/predict                   │                                ├──────────┤

The traceparent Header

The W3C Trace Context specification defines how trace context flows between services via HTTP headers:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             ^  ^                                ^                ^
             |  TraceId (128-bit / 32 hex chars) |                flags (01=sampled)
             version                             SpanId (64-bit / 16 hex chars)

When service A calls service B, it injects traceparent into the HTTP request headers. Service B extracts it and creates a new child span with the same TraceId, recording the incoming SpanId (the parent-id field) as its ParentSpanId.

Field	Size	Purpose
`version`	8-bit	Always `00` currently
`trace-id`	128-bit	Unique for the entire distributed trace
`parent-id`	64-bit	ID of the calling span (becomes ParentSpanId in child)
`trace-flags`	8-bit	`01` = sampled, `00` = not sampled

2. OpenTelemetry Python Setup

OpenTelemetry (OTel) is the vendor-neutral standard for distributed tracing (and now also metrics and logs). It was formed by merging OpenTracing and OpenCensus, and supersedes both.

Architecture

Python Service
┌─────────────────────────────────────┐
│  TracerProvider                     │
│  ├── Sampler (decides what to trace)│
│  ├── SpanProcessor                  │
│  │   └── BatchSpanProcessor         │
│  │       └── OTLPSpanExporter ─────────────► OpenTelemetry Collector
│  └── Resource (service metadata)    │                    │
└─────────────────────────────────────┘                    │ OTLP
                                                           ▼
                                                         Jaeger
                                                     (trace storage + UI)

Full Setup Module

# app/tracing.py
"""
OpenTelemetry tracing setup.

Call setup_tracing() once at application startup, before any
instrumentation libraries are initialised.
"""
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)
from opentelemetry.sdk.trace.sampling import (
    ALWAYS_ON,
    TraceIdRatioBased,
    ParentBased,
)
from opentelemetry.sdk.resources import (
    Resource,
    SERVICE_NAME,
    SERVICE_VERSION,
    DEPLOYMENT_ENVIRONMENT,
)
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.instrumentation.redis import RedisInstrumentor
from opentelemetry.instrumentation.logging import LoggingInstrumentor


def setup_tracing(
    service_name: str,
    service_version: str,
    environment: str,
    otlp_endpoint: str = "http://localhost:4317",
    sample_rate: float = 1.0,
    console_export: bool = False,
) -> TracerProvider:
    """
    Initialise OpenTelemetry tracing for a FastAPI service.

    Args:
        service_name: Name of this service (e.g., "document-api")
        service_version: Version string (e.g., "2.14.0")
        environment: Deployment environment (e.g., "production")
        otlp_endpoint: OTLP gRPC endpoint for the collector/Jaeger
        sample_rate: Fraction of traces to sample (1.0 = all, 0.1 = 10%)
        console_export: Also print spans to stdout (useful for debugging)

    Returns:
        The configured TracerProvider (also set as global)
    """
    # Resource: metadata attached to every span from this service
    resource = Resource.create({
        SERVICE_NAME: service_name,
        SERVICE_VERSION: service_version,
        DEPLOYMENT_ENVIRONMENT: environment,
        "host.name": os.uname().nodename,
        "process.pid": os.getpid(),
    })

    # Sampler: ParentBased respects the sampling decision of the upstream service.
    # If the upstream sampled the trace, we continue sampling it; if it did not,
    # we drop it too. Our own rate applies only to traces that start here.
    if sample_rate >= 1.0:
        sampler = ParentBased(root=ALWAYS_ON)
    else:
        sampler = ParentBased(root=TraceIdRatioBased(sample_rate))

    # Provider: the central object that creates tracers
    provider = TracerProvider(
        resource=resource,
        sampler=sampler,
    )

    # Exporter: sends spans to the backend
    otlp_exporter = OTLPSpanExporter(endpoint=otlp_endpoint)
    provider.add_span_processor(
        BatchSpanProcessor(
            otlp_exporter,
            max_queue_size=2048,
            max_export_batch_size=512,
            schedule_delay_millis=5000,  # Export every 5 seconds
            export_timeout_millis=10000,
        )
    )

    if console_export:
        provider.add_span_processor(
            BatchSpanProcessor(ConsoleSpanExporter())
        )

    # Set as the global provider - all calls to trace.get_tracer() use this
    trace.set_tracer_provider(provider)

    return provider


def instrument_app(app, engine=None, redis_client=None) -> None:
    """
    Apply auto-instrumentation to the FastAPI app and its dependencies.

    Call AFTER setup_tracing() and BEFORE the app starts handling requests.
    """
    # FastAPI: instruments all routes, adds span for each request
    FastAPIInstrumentor.instrument_app(
        app,
        server_request_hook=_server_request_hook,
        client_request_hook=None,
        client_response_hook=None,
    )

    # HTTPX: instruments all outbound HTTP calls made with httpx
    HTTPXClientInstrumentor().instrument(
        request_hook=_outbound_request_hook,
        response_hook=_outbound_response_hook,
    )

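    # SQLAlchemy: instruments all queries on the given engine.
    # The instrumentor needs a synchronous Engine; an AsyncEngine exposes
    # one as .sync_engine.
    if engine is not None:
        SQLAlchemyInstrumentor().instrument(
            engine=getattr(engine, "sync_engine", engine),
            enable_commenter=True,  # Appends a sqlcommenter comment (incl. traceparent) to each statement
        )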

    # Redis: patches the redis library globally, so every client is traced;
    # redis_client only controls whether this step runs
    if redis_client is not None:
        RedisInstrumentor().instrument()

    # Logging: adds otelTraceID, otelSpanID and otelServiceName to stdlib log
    # records; set_logging_format=True also puts them in the default log format
    LoggingInstrumentor().instrument(set_logging_format=True)


def _server_request_hook(span, scope):
    """Add custom attributes to every inbound request span."""
    if span and span.is_recording():
        # Add request metadata from ASGI scope
        if "headers" in scope:
            headers = dict(scope["headers"])
            if b"x-user-id" in headers:
                span.set_attribute("user.id", headers[b"x-user-id"].decode())


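def _outbound_request_hook(span, request):
    """Add custom attributes to every outbound HTTP request span."""
    if span and span.is_recording():
        span.set_attribute("http.request.url", str(request.url))


def _outbound_response_hook(span, request, response):
    """Flag rate-limited responses on outbound HTTP spans."""
    if span and span.is_recording() and response.status_code == 429:
        span.add_event("rate_limit.hit")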

FastAPI Integration

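# app/main.py
from contextlib import asynccontextmanager
from fastapi import FastAPI
from sqlalchemy.ext.asyncio import create_async_engine
from app.tracing import setup_tracing, instrument_app

engine = create_async_engine("postgresql+asyncpg://...")

# 1. Set up the tracing provider FIRST, at import time
provider = setup_tracing(
    service_name="document-api",
    service_version="2.14.0",
    environment="production",
    otlp_endpoint="http://otel-collector:4317",
    sample_rate=0.1,  # Sample 10% of new traces in production
)


@asynccontextmanager
async def lifespan(app: FastAPI):
    yield
    # Flush spans still queued in the BatchSpanProcessor before exit
    provider.shutdown()


app = FastAPI(lifespan=lifespan)

# 2. Instrument before the app handles its first event. FastAPIInstrumentor
# hooks into the ASGI middleware stack, which Starlette builds on the first
# event (lifespan startup included), so instrumenting inside lifespan is too late.
# The SQLAlchemy instrumentor needs the sync Engine behind the AsyncEngine.
instrument_app(app, engine=engine.sync_engine)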

docker-compose Setup

# docker-compose.yml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.96.0
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
    volumes:
      - ./config/otel-collector.yaml:/etc/otelcol-contrib/config.yaml

  jaeger:
    image: jaegertracing/all-in-one:1.55
    ports:
      - "16686:16686"  # Jaeger UI
      - "14250:14250"  # Collector gRPC
    environment:
      - COLLECTOR_OTLP_ENABLED=true

# config/otel-collector.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 5s
    send_batch_size: 1000
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

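exporters:
  # The dedicated jaeger exporter has been removed from recent collector
  # releases (including the 0.96.0 image above). Jaeger accepts OTLP
  # natively (COLLECTOR_OTLP_ENABLED=true), so send OTLP gRPC to its port 4317.
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/jaeger]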

3. Auto-Instrumentation: What Spans Are Generated

When you apply FastAPIInstrumentor, SQLAlchemyInstrumentor, RedisInstrumentor, and HTTPXClientInstrumentor, a typical request generates spans like these automatically:

POST /api/documents (FastAPI root span)
│ http.method: POST
│ http.url: /api/documents
│ http.status_code: 200
│ net.peer.ip: 10.0.0.1
│
├── SELECT documents WHERE id=? (SQLAlchemy span)
│   db.system: postgresql
│   db.statement: SELECT documents.id, documents.content_type ...
│   db.name: myapp_prod
│
├── GET document_cache:doc_8f3a (Redis span)
│   db.system: redis
│   db.statement: GET
│   net.peer.name: redis
│   net.peer.port: 6379
│
└── POST https://api.openai.com/v1/embeddings (HTTPX span)
    http.method: POST
    http.url: https://api.openai.com/v1/embeddings
    http.status_code: 200
    http.response_content_length: 4096

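This is already extremely useful for root cause analysis, and it requires no application code changes beyond the setup calls. Exact span names and attribute keys vary with the instrumentation version and the semantic-convention version it emits (newer conventions use keys such as http.request.method instead of http.method).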

4. Custom Spans

Auto-instrumentation covers I/O. Your business logic - document parsing, validation, ML pipeline stages - is invisible without custom spans.

Creating Custom Spans

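# app/services/document_processor.py
from __future__ import annotations  # Document (the app's model, not shown) is only used in annotations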
from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
import structlog

log = structlog.get_logger()
tracer = trace.get_tracer(__name__)  # Module-level tracer


class DocumentProcessor:
    """
    Processes documents through a multi-stage pipeline.
    Each stage gets its own span for individual timing.
    """

    async def process(self, doc_bytes: bytes, filename: str) -> Document:
        # The auto-instrumented FastAPI span is already active.
        # This span becomes a child of it automatically.
        with tracer.start_as_current_span(
            "document.process",
            attributes={
                "document.filename": filename,
                "document.size_bytes": len(doc_bytes),
            },
        ) as span:
            try:
                doc = await self._run_pipeline(doc_bytes, filename, span)
                span.set_status(Status(StatusCode.OK))
                return doc
            except Exception as exc:
                span.set_status(
                    Status(StatusCode.ERROR, description=str(exc))
                )
                span.record_exception(exc)
                raise

    async def _run_pipeline(
        self,
        doc_bytes: bytes,
        filename: str,
        parent_span,
    ) -> Document:
        # Stage 1: Content Type Detection
        with tracer.start_as_current_span("document.detect_content_type") as span:
            content_type = await self._detect_content_type(doc_bytes)
            span.set_attribute("document.content_type", content_type)

        # Stage 2: Text Extraction (most expensive stage)
        with tracer.start_as_current_span("document.extract_text") as span:
            span.set_attribute("document.content_type", content_type)
            text = await self._extract_text(doc_bytes, content_type)
            span.set_attribute("document.char_count", len(text))
            span.set_attribute("document.extraction_engine", "pdfplumber")

        # Stage 3: Chunking
        with tracer.start_as_current_span("document.chunk") as span:
            chunks = await self._chunk_text(text)
            span.set_attribute("document.chunk_count", len(chunks))
            span.set_attribute("document.avg_chunk_size", len(text) // max(len(chunks), 1))

        # Stage 4: Embedding (calls external API - auto-instrumented by HTTPX)
        with tracer.start_as_current_span("document.embed") as span:
            span.set_attribute("document.chunk_count", len(chunks))
            embeddings = await self._embed_chunks(chunks)
            span.set_attribute("embedding.dimensions", len(embeddings[0]) if embeddings else 0)

        # Stage 5: Store
        with tracer.start_as_current_span("document.store") as span:
            doc = await self._store(filename, text, chunks, embeddings, content_type)
            span.set_attribute("document.id", doc.id)
            return doc

    async def _extract_text(self, doc_bytes: bytes, content_type: str) -> str:
        span = trace.get_current_span()

        if content_type == "application/pdf":
            # Add a span event for significant moments within a span
            span.add_event(
                "pdf.open",
                attributes={"size_bytes": len(doc_bytes)},
            )
            text = await self._extract_pdf_text(doc_bytes)
            span.add_event(
                "pdf.extracted",
                attributes={"char_count": len(text)},
            )
        elif content_type == "text/plain":
            text = doc_bytes.decode("utf-8")
        else:
            raise ValueError(f"Unsupported content type: {content_type}")

        return text

Span Attributes vs Span Events

Feature	Span Attributes	Span Events
Purpose	Static properties of the operation	Time-stamped occurrences within the span
Example	`http.method = "POST"`, `db.name = "prod"`	`"pdf page 5 parsed"`, `"cache miss"`
Timestamp	None of its own; set at span creation or any time before the span ends	Has its own timestamp within the span
Use for	Characterising the span for filtering	Recording moments within a long operation
Cardinality concern	Only where the backend indexes attributes or derives metrics from them	Rarely - events are stored with the span, not indexed per series

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("llm.call") as span:
    # Attributes: static properties
    span.set_attribute("model.name", "gpt-4")
    span.set_attribute("model.version", "turbo-2024")
    span.set_attribute("request.tokens", 1024)

    # Events: things that happened during the span; each is timestamped automatically
    span.add_event("model.called")
    span.add_event("model.response.received", {"tokens_used": 837})
    span.add_event("rate_limit.hit", {"retry_after_seconds": 5})

5. Context Propagation

Context propagation is how tracing works across services. Without it, each service would start a new, disconnected trace.

HTTP: Automatic with HTTPX

When HTTPXClientInstrumentor is active, every request made with httpx.Client or httpx.AsyncClient automatically carries the traceparent header (plus tracestate and baggage when those are non-empty):

import httpx

# httpx is already instrumented by instrument_app() at startup
async def classify(text: str) -> httpx.Response:
    async with httpx.AsyncClient() as client:
        # traceparent header is injected automatically
        response = await client.post(
            "http://ml-service/api/predict",
            json={"text": text},
        )
        # The ml-service receives traceparent and continues the same trace
        return response

The outgoing request will have headers like:

POST /api/predict HTTP/1.1
Host: ml-service
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
Content-Type: application/json

Kafka: Manual Header Injection

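Instrumentation packages exist for some Kafka clients (for example opentelemetry-instrumentation-kafka-python), but they do not cover every client or use case. When you need full control, inject and extract the trace context manually.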

# app/messaging/kafka_producer.py
import json
from kafka import KafkaProducer
from opentelemetry import trace, propagate

tracer = trace.get_tracer(__name__)
producer = KafkaProducer(bootstrap_servers=["kafka:9092"])

def send_document_event(document_id: str, event_type: str) -> None:
    """
    Send a Kafka message with trace context in headers
    so the consumer can continue the trace.
    """
    with tracer.start_as_current_span(
        "kafka.produce",
        kind=trace.SpanKind.PRODUCER,
        attributes={
            "messaging.system": "kafka",
            "messaging.destination": "document-events",
            "messaging.operation": "send",
        },
    ) as span:
        # Collect the current trace context into a carrier dict
        carrier = {}
        propagate.inject(carrier)  # Fills carrier with traceparent, tracestate

        # Convert to Kafka headers format: list of (key, bytes) tuples
        headers = [
            (key, value.encode("utf-8"))
            for key, value in carrier.items()
        ]

        payload = json.dumps({
            "document_id": document_id,
            "event_type": event_type,
        }).encode("utf-8")

        producer.send(
            "document-events",
            value=payload,
            headers=headers,
        )

        span.set_attribute("messaging.message_id", document_id)


# app/messaging/kafka_consumer.py
from opentelemetry import trace, propagate
from kafka import KafkaConsumer
import json

tracer = trace.get_tracer(__name__)
consumer = KafkaConsumer("document-events", bootstrap_servers=["kafka:9092"])

def consume_messages():
    for message in consumer:
        # Extract trace context from Kafka headers. kafka-python delivers
        # headers as (str, bytes) tuples, so only the values need decoding.
        carrier = {
            key: value.decode("utf-8")
            for key, value in (message.headers or [])
            if value is not None
        }
        ctx = propagate.extract(carrier)

        # Start a span as a child of the producer's span
        with tracer.start_as_current_span(
            "kafka.consume",
            context=ctx,
            kind=trace.SpanKind.CONSUMER,
            attributes={
                "messaging.system": "kafka",
                "messaging.source": "document-events",
                "messaging.operation": "receive",
            },
        ) as span:
            data = json.loads(message.value)
            span.set_attribute("document.id", data["document_id"])
            process_document_event(data)  # Application handler (not shown)

The W3C Trace Context Spec

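Three headers carry context across service boundaries. traceparent and tracestate are defined by the W3C Trace Context specification; baggage is defined by the separate W3C Baggage specification:

Header	Format	Example
`traceparent`	`{version}-{traceid}-{parentid}-{flags}`	`00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01`
`tracestate`	Vendor-specific key-value pairs	`dd=s:2;o:rum,congo=t61rcWkgMzE`
`baggage`	Comma-separated `key=value` pairs (W3C Baggage)	`userId=alice,serverNode=iad-2,isProduction=false`

OpenTelemetry Python's default global propagator handles all three: it combines TraceContextTextMapPropagator (traceparent, tracestate) with W3CBaggagePropagator (baggage), matching the default OTEL_PROPAGATORS=tracecontext,baggage.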

6. Baggage

Baggage is a key-value store that propagates with the trace context across service boundaries. Unlike span attributes (which are only visible in that span), baggage values are available to all services in the call chain.

Use cases:

user.tier = "enterprise" - downstream services apply different rate limits
feature.flag = "new_chunker" - all services log which feature flag variant is active
ab.variant = "B" - correlate all spans in a trace to an A/B test variant

# app/middleware/baggage_middleware.py
from opentelemetry import baggage, context
from starlette.middleware.base import BaseHTTPMiddleware

class BaggageMiddleware(BaseHTTPMiddleware):
    """
    Inject user-tier and feature flags into baggage so
    downstream services can access them without passing them
    as explicit parameters.
    """

    async def dispatch(self, request, call_next):
        # Set baggage values - these propagate to all downstream services.
        # Hard-coded for illustration; real code would derive them from the request.
        ctx = baggage.set_baggage("user.tier", "enterprise")
        ctx = baggage.set_baggage(
            "feature.new_chunker",
            "enabled",
            context=ctx,
        )
        token = context.attach(ctx)
        try:
            return await call_next(request)
        finally:
            context.detach(token)


# In any downstream service, read baggage:
user_tier = baggage.get_baggage("user.tier")
if user_tier == "enterprise":
    apply_enterprise_rate_limit()

Baggage Caution

Baggage propagates to every service you call, including third-party APIs reached through instrumented clients. Never put sensitive data (PII, credentials) in baggage. Keep it small: it is sent as a header on every outbound request.

7. Sampling Strategies

In production, you cannot trace every request. At 10,000 requests/second that is 600,000 traces per minute; at around ten spans per request, full tracing generates six million spans per minute - too expensive to store and too slow to export.

Sampler Types

from opentelemetry.sdk.trace.sampling import (
    ALWAYS_ON,              # Sample everything - dev/testing only
    ALWAYS_OFF,             # Sample nothing - disable tracing
    TraceIdRatioBased,      # Deterministically sample N% of traces by TraceId hash
    ParentBased,            # Defer to upstream's decision; apply own rate for new traces
)

# Development: sample everything
sampler = ALWAYS_ON

# Production: sample 10% of new traces; always continue parent's sampling decision
sampler = ParentBased(root=TraceIdRatioBased(0.10))

# High-traffic service: sample 1%
sampler = ParentBased(root=TraceIdRatioBased(0.01))

Rate-Limiting Sampler

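TraceIdRatioBased samples a percentage, but in bursts you might still generate too many traces. A rate-limiting sampler caps traces per second. Because the SDK consults the sampler for every new span, use it as the root of ParentBased (ParentBased(root=RateLimitingSampler(5.0))) so that child spans follow their trace's decision instead of each consuming a token: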

# app/tracing/rate_limiting_sampler.py
import time
import threading
from opentelemetry.sdk.trace.sampling import Sampler, SamplingResult, Decision
from opentelemetry.trace import SpanKind
from opentelemetry.trace.span import TraceState
from opentelemetry.util.types import Attributes

class RateLimitingSampler(Sampler):
    """
    Samples at most `max_traces_per_second` traces per second.
    Uses a token bucket algorithm.
    """

    def __init__(self, max_traces_per_second: float = 5.0):
        self._max_traces_per_second = max_traces_per_second
        # Bucket capacity (burst size); at least 1 so rates below 1/s still sample
        self._capacity = max(1.0, max_traces_per_second)
        self._tokens = self._capacity
        self._last_refill = time.monotonic()
        self._lock = threading.Lock()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self._last_refill
        self._tokens = min(
            self._capacity,
            self._tokens + elapsed * self._max_traces_per_second,
        )
        self._last_refill = now

    def should_sample(
        self,
        parent_context,
        trace_id,
        name,
        kind=None,
        attributes=None,
        links=None,
        trace_state=None,
    ) -> SamplingResult:
        with self._lock:
            self._refill()
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return SamplingResult(
                    decision=Decision.RECORD_AND_SAMPLE,
                    attributes=attributes,
                    trace_state=trace_state or TraceState(),
                )
        return SamplingResult(
            decision=Decision.DROP,
            attributes=None,
            trace_state=trace_state or TraceState(),
        )

    def get_description(self) -> str:
        return f"RateLimitingSampler({self._max_traces_per_second}/s)"

Sampling Strategy Decision Table

Scenario	Recommended Sampler	Sample Rate
Development / local	`ALWAYS_ON`	100%
Staging	`ParentBased(TraceIdRatioBased)`	100%
Production, low traffic (<100 req/s)	`ParentBased(TraceIdRatioBased)`	100%
Production, medium traffic (100–1000 req/s)	`ParentBased(TraceIdRatioBased)`	10%
Production, high traffic (>1000 req/s)	`ParentBased(TraceIdRatioBased)`	1% + `RateLimitingSampler`
Always trace errors	Tail-based sampling (OpenTelemetry Collector `tail_sampling` processor)	100% errors, 1% successes

Tail-Based Sampling Concept

Head-based sampling (what we have described so far) makes the sampling decision at the beginning of a trace, before you know whether it will be slow or fail. At a 1% rate you drop 99% of requests, and the one slow request you needed is very likely among them.

Tail-based sampling makes the decision after the trace completes. It keeps all error traces and all traces above a latency threshold, and samples the rest. This requires a dedicated component, such as the OpenTelemetry Collector's tail_sampling processor (which must also be listed in the traces pipeline's processors). It has two consequences: the services themselves should export every span (sample_rate=1.0 in setup_tracing()), and all spans of a trace must reach the same collector instance.

# config/otel-collector.yaml (tail sampling example)
processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: slow_traces
        type: latency
        latency: {threshold_ms: 1000}
      - name: probabilistic
        type: probabilistic
        probabilistic: {sampling_percentage: 1}
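To make the policy semantics concrete, here is a simplified Python model of the three policies above: a trace is kept if any policy matches. It is a sketch for reasoning about which traces survive, not the collector's implementation, which supports many more policy types and buffers spans for decision_wait before deciding; FinishedSpan and keep_trace() are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FinishedSpan:
    trace_id: str
    start_ms: float
    end_ms: float
    is_error: bool = False


def keep_trace(
    spans: list[FinishedSpan],
    latency_threshold_ms: float = 1000,
    sampling_percentage: float = 1.0,
) -> bool:
    """Decide once all spans of one trace have arrived; keep if ANY policy matches."""
    # errors: one ERROR span keeps the whole trace
    if any(span.is_error for span in spans):
        return True

    # slow_traces: trace duration = earliest start to latest end
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if duration >= latency_threshold_ms:
        return True

    # probabilistic: hash the trace ID so the verdict is stable across runs
    digest = hashlib.sha256(spans[0].trace_id.encode("utf-8")).digest()
    position = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return position < sampling_percentage / 100


slow = [FinishedSpan("t2", 0, 600), FinishedSpan("t2", 500, 1500)]
print(keep_trace(slow))  # True: 1500ms from first start to last end
```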

8. Reading Jaeger

Open http://localhost:16686 to access the Jaeger UI.

Finding a Trace

Search:
  Service: document-api
  Operation: POST /api/documents
  Min Duration: 1s          ← filter for slow traces
  Lookback: Last 1 hour

Reading the Waterfall

A waterfall diagram shows spans as horizontal bars. The key things to look for:

TRACE: abc123  Total: 1,891ms
────────────────────────────────────────────────────────────
POST /api/documents                         ████████████████████████████████ 1,891ms
  document.process                          ████████████████████████████████ 1,878ms
    document.detect_content_type            █  23ms
    document.extract_text                   ████████████  634ms
    [GAP - between extract and chunk]       ████  201ms   ← SUSPICIOUS
    document.chunk                          ███  87ms
    document.embed                          ████████████████  843ms
      POST https://api.openai.com/...       ████████████████  841ms
    document.store                          ██  89ms
      SELECT documents ...                  █  12ms
      INSERT documents ...                  █  43ms

The 201ms gap between extract_text and chunk does not belong to any child span, yet it shows up in the waterfall as empty time inside document.process: it is time spent in Python code between the two with tracer.start_as_current_span() blocks. This is a key advantage of tracing over logging: the gaps are visible.

The embed span at 843ms is almost entirely consumed by the OpenAI API call (841ms). The fix: batch requests or add caching.
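Spotting gaps by eye works for one trace; across many traces they can be computed. A minimal sketch, assuming span start and end times have already been converted to milliseconds relative to the trace start (ReadableSpan.start_time and end_time are in nanoseconds): find_gaps() is a hypothetical helper that merges overlapping child intervals and reports uncovered stretches of the parent. The offsets in the usage lines are assumed, reconstructed from the durations in the waterfall above.

```python
def find_gaps(
    parent: tuple[float, float],
    children: list[tuple[float, float]],
    min_gap_ms: float = 10.0,
) -> list[tuple[float, float]]:
    """Return (start, end) intervals inside `parent` not covered by any child.

    Overlapping children (e.g. concurrent async calls) are merged implicitly:
    the cursor only ever moves forward to the furthest child end seen so far.
    """
    parent_start, parent_end = parent
    gaps = []
    cursor = parent_start
    for start, end in sorted(children):
        if start - cursor >= min_gap_ms:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if parent_end - cursor >= min_gap_ms:
        gaps.append((cursor, parent_end))
    return gaps


# Offsets (ms) assumed from the durations in the waterfall
process = (0, 1878)
children = [
    (0, 23),       # document.detect_content_type
    (23, 657),     # document.extract_text
    (858, 945),    # document.chunk
    (945, 1788),   # document.embed
    (1788, 1877),  # document.store
]
print(find_gaps(process, children))  # [(657, 858)]
```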

Comparing Two Traces

Jaeger allows selecting two traces and diffing them. This is invaluable for questions like "what's different between a fast and slow request with the same route?"

9. Connecting Traces to Logs

The final step in the observability triad: linking a log line to the trace that caused it.

structlog Processor for Trace IDs

# app/logging/otel_processor.py
from typing import Any
from structlog.types import EventDict

def inject_trace_context(
    logger: Any, method: str, event_dict: EventDict
) -> EventDict:
    """
    Inject the current OpenTelemetry trace ID and span ID into the log record.

    When a log line is viewed in Loki/Kibana, you can click the trace_id
    to jump directly to the corresponding trace in Jaeger.
    """
    try:
        from opentelemetry import trace

        span = trace.get_current_span()
        if span.is_recording():
            ctx = span.get_span_context()
            # Format as 32-char hex for trace ID, 16-char for span ID
            event_dict["trace_id"] = format(ctx.trace_id, "032x")
            event_dict["span_id"] = format(ctx.span_id, "016x")
            # W3C traceparent format for easy correlation
            event_dict["traceparent"] = (
                f"00-{format(ctx.trace_id, '032x')}"
                f"-{format(ctx.span_id, '016x')}"
                f"-{'01' if ctx.trace_flags.sampled else '00'}"
            )
    except ImportError:
        pass

    return event_dict

Add this processor to your structlog configuration (in logging_config.py):

# In setup_logging(), add before the renderer:
structlog.configure(
    processors=[
        structlog.contextvars.merge_contextvars,
        structlog.stdlib.add_log_level,
        structlog.stdlib.add_logger_name,
        structlog.processors.TimeStamper(fmt="iso", utc=True),
        inject_trace_context,          # ← Add this
        structlog.processors.format_exc_info,
        mask_sensitive_data,
        structlog.processors.JSONRenderer(),
    ],
    ...
)

Log Lines with Trace IDs

Every log line now carries trace_id and span_id:

{
  "timestamp": "2026-03-07T09:14:33.891Z",
  "level": "error",
  "event": "document.extract_text.failed",
  "filename": "report.pdf",
  "error": "PDFSyntaxError: EOF marker not found",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
  "request_id": "req_7e9d3b",
  "service": "document-api"
}

Grafana: From Log to Trace in One Click

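In Grafana, with Loki configured as the log data source and Jaeger as a trace data source, a Loki derived field turns the trace_id in each log line into a link to Jaeger. The Loki datasource definition below extracts the ID with matcherRegex and hands it to the Jaeger datasource whose UID is jaeger-uid (replace this with your Jaeger datasource's actual UID). The regex allows optional whitespace after the colon because structlog's JSONRenderer, with its default json.dumps serializer, writes "trace_id": "..." with a space. In a provisioning YAML file the same keys apply, but Grafana interpolates environment variables there, so the URL must be escaped as $${__value.raw}.

{
  "name": "Loki",
  "type": "loki",
  "url": "http://loki:3100",
  "jsonData": {
    "derivedFields": [
      {
        "matcherRegex": "\"trace_id\":\\s*\"(\\w+)\"",
        "name": "TraceID",
        "url": "${__value.raw}",
        "datasourceUid": "jaeger-uid",
        "urlDisplayLabel": "View Trace in Jaeger"
      }
    ]
  }
}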

Now when you expand a log line that contains a trace_id in Grafana (for example in Explore), the log details show a TraceID field with a "View Trace in Jaeger" link. One click takes you from the log line to the full distributed trace in Jaeger.
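Derived-field regexes run against the raw log line, so a single space can decide whether the link appears. The check below compares a pattern that allows no whitespace after the colon with one that tolerates it, against a line serialised the way structlog's JSONRenderer does by default (json.dumps). Python's re is only a stand-in for Grafana's regex engine here; \s and \w behave the same for this input.

```python
import json
import re

# A log line as json.dumps (structlog's default serializer) would emit it.
line = json.dumps({
    "event": "document.extract_text.failed",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
})

strict = re.compile(r'"trace_id":"(\w+)"')      # no whitespace allowed after ':'
tolerant = re.compile(r'"trace_id":\s*"(\w+)"')  # optional whitespace after ':'

strict_match = strict.search(line)
tolerant_match = tolerant.search(line)

print(line)
print("strict:", strict_match)
print("tolerant:", tolerant_match.group(1) if tolerant_match else None)
```

If you switch the renderer to a compact serializer (for example json.dumps with separators=(",", ":")), both patterns match; the tolerant one works either way.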

Complete OpenTelemetry Integration Test

# tests/test_tracing.py
import asyncio

import pytest
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from opentelemetry.trace import StatusCode

# The global TracerProvider can only be set once per process: later calls to
# trace.set_tracer_provider() are ignored with a warning. Creating a provider
# per test would leave every test after the first with an empty exporter, so
# install one provider for the module and reset the exporter between tests.
_EXPORTER = InMemorySpanExporter()
_PROVIDER = TracerProvider()
_PROVIDER.add_span_processor(SimpleSpanProcessor(_EXPORTER))
trace.set_tracer_provider(_PROVIDER)


@pytest.fixture
def trace_exporter():
    """
    In-memory span exporter for testing.
    Captures all spans created during the test, starting from an empty list.
    """
    _EXPORTER.clear()
    yield _EXPORTER
    _EXPORTER.clear()


def test_document_processor_creates_spans(trace_exporter):
    """Verify the document processor creates the expected span hierarchy."""
    from app.services.document_processor import DocumentProcessor

    processor = DocumentProcessor()
    asyncio.run(processor.process(b"test content", "test.txt"))

    spans = trace_exporter.get_finished_spans()
    span_names = [s.name for s in spans]

    assert "document.process" in span_names
    assert "document.detect_content_type" in span_names
    assert "document.extract_text" in span_names
    assert "document.chunk" in span_names
    assert "document.store" in span_names

    # Verify parent-child relationships: the four stages are direct children
    root_span = next(s for s in spans if s.name == "document.process")
    child_spans = [
        s for s in spans
        if s.parent and s.parent.span_id == root_span.context.span_id
    ]
    assert len(child_spans) >= 4


def test_error_span_has_error_status(trace_exporter):
    """Verify that exceptions set span status to ERROR."""
    from app.services.document_processor import DocumentProcessor

    processor = DocumentProcessor()
    with pytest.raises(ValueError):
        asyncio.run(processor.process(b"", "corrupted.pdf"))

    spans = trace_exporter.get_finished_spans()
    root_span = next(s for s in spans if s.name == "document.process")

    assert root_span.status.status_code == StatusCode.ERROR

Interview Questions and Answers

Q1: A distributed trace shows a total duration of 2 seconds, but the sum of all individual spans is only 1.2 seconds. Is this a bug in the tracing instrumentation?

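No - this is expected and represents uninstrumented time (assuming the 1.2 seconds is the sum of the root span's direct, non-overlapping children; nested spans overlap their parents, so summing every span in the tree double-counts). The "missing" 800ms is time the request spent in code paths that have no spans: Python interpreter overhead, garbage collection pauses, context switches, code that runs between tracer.start_as_current_span() blocks, and library code that is not instrumented. Across services, the same gap shows up as time inside a client span that the matching server span does not cover: network transit, queueing, and connection setup. This gap is one of the most valuable signals tracing provides, because it tells you where to add more instrumentation. Look for the largest gaps between consecutive sibling spans and add custom spans there.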
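"Look for the largest gaps between consecutive sibling spans" can be done mechanically once spans are exported. The sketch below assumes a simplified, hypothetical SpanRecord with millisecond start/end times (not an OpenTelemetry type); it walks a parent's children in start order and reports every stretch of the parent's duration that no child covers, largest first. Overlapping children (for example concurrent async calls) are handled by tracking the furthest end time seen so far.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SpanRecord:
    """Hypothetical flattened span: times in milliseconds since trace start."""
    name: str
    span_id: str
    parent_id: Optional[str]
    start_ms: float
    end_ms: float


def largest_sibling_gaps(
    spans: List[SpanRecord], parent_id: str
) -> List[Tuple[str, str, float]]:
    """Return (before, after, gap_ms) for uncovered time inside a parent span."""
    parent = next(s for s in spans if s.span_id == parent_id)
    children = sorted(
        (s for s in spans if s.parent_id == parent_id), key=lambda s: s.start_ms
    )
    gaps: List[Tuple[str, str, float]] = []
    cursor_name, cursor_end = "<parent start>", parent.start_ms
    for child in children:
        gap = child.start_ms - cursor_end
        if gap > 0:
            gaps.append((cursor_name, child.name, gap))
        # Advance only if this child extends coverage (handles overlap).
        if child.end_ms > cursor_end:
            cursor_name, cursor_end = child.name, child.end_ms
    tail = parent.end_ms - cursor_end
    if tail > 0:
        gaps.append((cursor_name, "<parent end>", tail))
    return sorted(gaps, key=lambda g: g[2], reverse=True)


# Hypothetical numbers, loosely shaped like the 800ms incident.
spans = [
    SpanRecord("handle_request", "a", None, 0, 800),
    SpanRecord("fetch_doc", "b", "a", 12, 199),
    SpanRecord("inference.run", "c", "a", 650, 793),
]
for before, after, gap in largest_sibling_gaps(spans, "a"):
    print(f"{gap:>6.1f}ms uninstrumented between {before} and {after}")
```

With real data you would build the records from exported spans (for example ReadableSpan.start_time and end_time, which are nanoseconds).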

Q2: You set sample_rate=0.01 (1% sampling). A critical bug causes errors on 5 requests per second. How many error traces do you capture?

With TraceIdRatioBased(0.01), only 1% of traces are kept, and the decision is made when the root span starts, before anyone knows whether the request will fail. Error traces are therefore sampled at the same 1% as everything else: at 5 errors per second you capture about 5 × 0.01 = 0.05 error traces per second, or about 3 per minute. This is the key weakness of head-based sampling. The solution is tail-based sampling (for example, the tail_sampling processor in the OpenTelemetry Collector contrib distribution), which makes the sampling decision after the trace completes and can apply a policy like "always keep error traces, sample 1% of success traces." A cheaper alternative is to keep head sampling with ParentBased(TraceIdRatioBased(0.01)) and rely on a Prometheus error-rate alert, which counts every failed request regardless of trace sampling, to detect the incident.
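The policy "always keep error traces, sample 1% of success traces" can be sketched as a decision function that only runs once every span of a trace has arrived. This is a hypothetical illustration of the tail-based idea, not the Collector's tail_sampling implementation; FinishedSpan and keep_trace are invented names, and the ratio check mirrors the Python SDK's trace-ID comparison (lower 64 bits against ratio × 2^64) so that the decision is deterministic per trace.

```python
from dataclasses import dataclass
from typing import Iterable

TRACE_ID_LIMIT = (1 << 64) - 1  # compare on the lower 64 bits of the trace ID


@dataclass(frozen=True)
class FinishedSpan:
    """Hypothetical minimal view of a completed span."""
    trace_id: int
    name: str
    is_error: bool


def keep_trace(spans: Iterable[FinishedSpan], success_ratio: float) -> bool:
    """Tail-based decision: called after the whole trace has completed."""
    spans = list(spans)
    if not spans:
        return False
    # Policy 1: any error anywhere in the trace means keep it.
    if any(s.is_error for s in spans):
        return True
    # Policy 2: keep a deterministic fraction of successful traces.
    bound = round(success_ratio * (TRACE_ID_LIMIT + 1))
    return (spans[0].trace_id & TRACE_ID_LIMIT) < bound


tid = 0x4BF92F3577B34DA6A3CE929D0E0E4736
failed = [FinishedSpan(tid, "document.process", False),
          FinishedSpan(tid, "document.extract_text", True)]
succeeded = [FinishedSpan(tid, "document.process", False)]
print(keep_trace(failed, 0.01), keep_trace(succeeded, 0.01))
```

The price of this design is buffering: the decider has to hold every span of a trace in memory until it can be sure the trace is complete, which is why tail sampling usually lives in a dedicated Collector tier rather than in the application.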

Q3: Two developers disagree about whether to put user_id in span attributes or in baggage. Who is right?

Both approaches have valid use cases, but they solve different problems. Span attributes attach the value to one specific span: it appears on that span in Jaeger and nowhere else. Baggage propagates the value, in the baggage header, to every downstream service in the trace without each service needing to pass it explicitly. Baggage does not appear in Jaeger by itself, though: a service that wants the value on its spans must read it from baggage and copy it into span attributes. So if user_id should be visible on spans across all services (e.g., for security auditing or per-user SLOs), propagate it in baggage and copy it into attributes in each service. If user_id is only relevant to the service that has it (e.g., the authentication service), set it only as a span attribute. The caution with baggage: it adds to every outgoing request header, and it is visible to all downstream services, including third-party ones, so sensitive data should never go in baggage.

Q4: How does ParentBased(root=TraceIdRatioBased(0.10)) behave differently from TraceIdRatioBased(0.10) alone?

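TraceIdRatioBased(0.10) on its own ignores the parent context: it decides for every span from the trace ID alone (in the Python SDK, the span is sampled if the lower 64 bits of the trace ID fall below 10% of that range). Within one service this is consistent, because every span of a trace shares the same trace ID. Across services, it agrees with the upstream decision only if the upstream uses the same sampler, ratio, and comparison algorithm, which is not guaranteed between SDKs in different languages. If the upstream service sampled the trace (sampled flag set in traceparent) at 100%, or at a different ratio, this service still drops roughly 90% of those traces and the downstream portion is lost: the trace is broken. ParentBased wraps the sampler: if there is a sampled parent context (upstream said "sample this"), ParentBased always continues sampling. If there is an unsampled parent context, ParentBased always drops. Only for root spans (no parent) does it delegate to the wrapped sampler (TraceIdRatioBased(0.10)). This ensures trace continuity: once a trace is sampled at the entry point, it stays sampled through all downstream services.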

Q5: You have a Python service that processes Kafka messages. Each message processing starts a new trace. After six months, you realise you cannot correlate message processing traces with the API request traces that produced the messages. How do you fix this going forward?

The fix is trace context injection at produce time and extraction at consume time, as shown in this lesson. The producer injects the current span's context into Kafka message headers (traceparent, tracestate). The consumer extracts that context and starts its span with context=extracted_context. The consumer span then appears as a child of the producer span in the trace, even though they ran asynchronously at different times. In Jaeger, the trace shows the full causal chain: HTTP request → Kafka produce → (async gap) → Kafka consume → processing pipeline. For existing messages that were produced without trace headers, you can only add this going forward - you cannot retroactively link them.
